the project was to provide a useful tool to aid in the tuning of the monitor's scheduling
algorithm. As the simulation evolved, the model served as a vehicle to study proposed 
hardware changes. 
Disk arrays were proposed in the 1980s as a way to use parallelism between multiple disks
to improve aggregate I/O performance. Today they appear in the product lines of most 
major computer manufacturers. This article gives a comprehensive overview of disk arrays 
and provides a framework in which to organize current and future work. First, the article 
introduces disk technology and reviews the driving forces that have popularized disk 
arrays: performance and reliability. It discusses the tw ... 

Keywords: RAID, disk array, parallel I/O, redundancy, storage, striping 
The code that initializes a system can be notoriously difficult to understand. In secure
systems, initialization is critical for establishing a starting state that is secure. This paper 
explores two architectures used for bringing an operating system to its initial state, once 
the operating system gains control from the boot loader. Specifically, the ways in which 
the OpenBSD and Linux operating systems handle initialization are dissected. 
The Sprite network operating system uses large main-memory disk block caches to
achieve high performance in its file system. It provides non-write-through file caching on 
both client and server machines. A simple cache consistency mechanism permits files to be 
shared by multiple clients without danger of stale data. In order to allow the file cache to 
occupy as much memory as possible, the file system of each machine negotiates with the
virtual memory system over physical memory usage and c ... 
Database management systems will continue to manage large data volumes. Thus,
efficient algorithms for accessing and manipulating large sets and sequences will be 
required to provide acceptable performance. The advent of object-oriented and extensible 
database systems will not solve this problem. On the contrary, modern data models 
exacerbate the problem: In order to manipulate large sets of complex objects as efficiently 
as today's database systems manipulate simple records, query-processi ... 

Keywords: complex query evaluation plans, dynamic query evaluation plans, extensible 
database systems, iterators, object-oriented database systems, operator model of 
parallelization, parallel algorithms, relational database systems, set-matching algorithms, 
sort-hash duality 
This paper presents an overview of the Cedar programming environment, focusing on its
overall structure, that is, the major components of Cedar and the way they are organized.
Cedar supports the development of programs written in a single programming language, 
also called Cedar. Its primary purpose is to increase the productivity of programmers 
whose activities include experimental programming and the development of prototype 
software systems for a high-performance personal computer. T ... 
Most multitasking operating systems support scheduling priorities in order to ensure that
processor time is allocated to important or time-critical processes in preference to less 
important ones. Ideally this would prevent a low-priority process from slowing the 
execution of a high-priority one. In practice, strict prioritisation is undermined by a lack of 
suitable allocation policy for resources other than CPU time. For example, a low priority 
process may degrade the execution speed of a high-p ... 
While much current research concerns multiprocessor design, few traces of parallel
programs are available for analyzing the effect of design trade-offs. Existing trace 
collection methods have serious drawbacks: trap-driven methods often slow down 
program execution by more than 1000 times, significantly perturbing program behavior; 
microcode modification is faster, but the technique is neither general nor portable. This 
paper describes a new tool, called MPTRACE, for collecting tr ... 
Energy consumption has recently been widely recognized as a major challenge of
computer systems design. This paper explores how to support energy as a first-class 
operating system resource. Energy, because of its global system nature, presents 
challenges beyond those of conventional resource management. To meet these challenges 
we propose the Currentcy Model that unifies energy accounting over diverse hardware 
components and enables fair allocation of available energy among applications. Our par ... 
Thor is an object-oriented database system designed for use in a heterogeneous
distributed environment. It provides highly-reliable and highly-available persistent storage 
for objects, and supports safe sharing of these objects by applications written in different 
programming languages. Safe heterogeneous sharing of long-lived objects requires 
encapsulation: the system must guarantee that applications interact with objects only by 
invoking methods. Although safety concerns are important, most obj ... 
1 Energy-aware computing: Exposing disk layout to compiler for reducing energy
consumption of parallel disk based systems
Publisher: ACM Press
Disk subsystem is known to be a major contributor to overall power consumption of
high-end parallel systems. Past research proposed several architectural level techniques to
reduce disk power by taking advantage of idle periods experienced by disks. While such 
techniques have been known to be effective in certain cases, they share a common 
drawback: they operate in a reactive manner; i.e., they control disk power by observing 
past disk activity (e.g., idle and active periods) and estimating futu ... 

Keywords: low power, optimizing compiler, parallel disk 
Many factors have contributed to the birth and continued growth of mobile computing,
including recent advances in hardware and communications technology. With this new 
paradigm however come new challenges in computer operating systems development. 
These challenges include heretofore relatively unusual items such as frequent network 
disconnections, communications bandwidth limitations, resource restrictions, and power 
limitations. It is the last of these challenges that we shall explore in this p ... 
The Logical Disk (LD) defines a new interface to disk storage that separates file
management and disk management by using logical block numbers and block lists. The LD 
interface is designed to support multiple file systems and to allow multiple
implementations, both of which are important given the increasing use of kernels that 
support multiple operating system personalities. A log-structured implementation of LD 
(LLD) demonstrates that LD can be implemented efficiently. LLD adds about 5% to 10% ... 



Keywords: MINIX, UNIX, disk storage management, file system organization, file system 
performance, high write performance, log-structured file system, logical disk 
Designing clusters of PCs for distributed databases processing OLAP (On Line Analytical
Processing) workloads in parallel with good scalability remains a particular challenge as we
are lacking a deep understanding of the architectural issues around resource usage by
standard DBMSs on distributed platforms. To address this problem, we present a novel
performance monitoring framework for filtering and abstracting samples of performance 
data from low level counters into a high level performance pictu ... 

Keywords: cluster of PCs, distributed OLAP processing, parallel databases, performance 
analysis, workload characterization 
This article describes the Digital Continuous Profiling Infrastructure, a sampling-based
profiling system designed to run continuously on production systems. The system supports 
multiprocessors, works on unmodified executables, and collects profiles for entire systems, 
including user programs, shared libraries, and the operating system kernel. Samples are 
collected at a high rate (over 5200 samples/sec. per 333MHz processor), yet with low 
overhead (1-3% slowdown for most workloads). A ... 

Keywords: performance understanding, performance-monitoring hardware, profiling, 
program analysis 
The development of new large scale time-sharing systems has raised a number of
problems for computation center management. Not only is it necessary to develop an 
appropriate hardware configuration for these systems, but appropriate software 
adjustments must be made. Unfortunately, these systems often do not respond to changes 
in the manner that intuition would suggest, and there are few guides to assist in the 
analysis of performance characteristics. The development of a comprehensive simul ... 
With the advent of real-time and goal-oriented database systems, priority scheduling is
likely to be an important feature in future database management systems. A consequence 
of priority scheduling is that a transaction may lose its buffers to higher-priority 
transactions, and may be given additional memory when transactions leave the system. 
Due to their heavy reliance on main memory, hash joins are especially vulnerable to 
fluctuations in memory availability. Previous studies have propose ... 
In decision support applications, the ability to provide fast approximate answers to
aggregation queries is desirable. One commonly-used technique for approximate query 
answering is sampling. For many aggregation queries, appropriately constructed biased 
(non-uniform) samples can provide more accurate approximations than a uniform sample. 
The optimal type of bias, however, varies from query to query. In this paper, we describe 
an approximate query processing technique that dynamically constructs ... 
In this paper, we show that shared virtual memory, in a shared-nothing multiprocessor,
facilitates the design and implementation of parallel join processing algorithms that 
perform significantly better in the presence of skew than previously proposed parallel join 
processing algorithms. We propose two variants of an algorithm for parallel join processing 
using shared virtual memory, and perform a detailed simulation to investigate their 
performance. The algorithm is unique in that it employ ... 
In previous work, the first author argued for simple lightweight visualisations. These are
surprisingly complex to produce due to the need for infrastructure to read files, etc. 
onCue, a desktop 'agent', aids the rapid production of such visualisations and their 
integration with desktop and Internet applications. Two examples are used: dancing
histograms for 2D tables and pieTrees for hierarchical numeric data. A major focus is the 
importance of architecture, both that of onCue itself and th ... 

Keywords: Internet-desktop integration, artificial intelligence, hierarchical data,
interactive visualisation, software architecture 
We present a pipelining, dynamically tunable reorder operator for providing user control
during long running, data-intensive operations. Users can see partial results and
accordingly direct the processing by specifying preferences for various data items; data of
interest is prioritized for early processing. The reordering mechanism is efficient and
non-blocking and can be used over arbitrary data streams from files and indexes, as well as
continuous data feeds. We also investigate severa ... 

Keywords: Informix, Interactive data processing, Online reordering, User control 
A number of multiversion concurrency control algorithms have been proposed in the past
few years. These algorithms use previous versions of data items in order to improve the 
level of achievable concurrency. This paper describes a simulation study of the 
performance of several multiversion concurrency control algorithms, investigating the 
extent to which they provide increases in the level of concurrency and also the CPU, I/O, 
and storage costs resulting from the use of multiple versions. T ... 
By providing direct data transfer between storage and client, network-attached storage
devices have the potential to improve scalability for existing distributed file systems (by 
removing the server as a bottleneck) and bandwidth for new parallel and distributed file 
systems (through network striping and more efficient data paths). Together, these 
advantages influence a large enough fraction of the storage market to make commodity 
network-attached storage feasible. Realizing the technology's ful ... 
The I/O behavior of some scientific applications, a subset of Perfect benchmarks,
executing on a multiprocessor is studied. The aim of this study is to explore the various 
patterns of I/O access of large scientific applications and to understand the impact of this 
observed behavior on the I/O subsystem architecture. I/O behavior of the program is 
characterized by the demands it imposes on the I/O subsystem. It is observed that implicit 
I/O or paging is not a major problem for the applicatio ...
We consider a cluster architecture in which dynamic content is generated by a database
back-end and a collection of Web and application server front-ends. We study the effect of 
transparent query caching on the performance of such a cluster. Transparency requires 
that cached entries be invalidated as a result of writes. We start with a coarse-grain
table-level automatic invalidation cache. Based on observed workload characteristics, we
enhance the cache with the necessary dependency tracking and ... 



20 Scale and performance in a distributed file system 

John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, 
Robert N. Sidebotham, Michael J. West 
February 1988 ACM Transactions on Computer Systems (TOCS), volume 6 issue 1

Publisher: ACM Press 

Full text available: pdf (2.38 MB) Additional Information: full citation, abstract, references, citings, index
terms, review

The Andrew File System is a location-transparent distributed file system that will
eventually span more than 5000 workstations at Carnegie Mellon University. Large scale 
affects performance and complicates system operation. In this paper we present 
observations of a prototype implementation, motivate changes in the areas of cache 
validation, server process structure, name translation, and low-level storage 
representation, and quantitatively demonstrate Andrew's ability to scale gracefully. W ...